    Transfer learning of language-independent end-to-end ASR with language model fusion

    This work explores better adaptation methods for low-resource languages using an external language model (LM) under the framework of transfer learning. We first build a language-independent ASR system in a unified sequence-to-sequence (S2S) architecture with a vocabulary shared among all languages. During adaptation, we perform LM fusion transfer, in which an external LM is integrated into the decoder network of the attention-based S2S model throughout the adaptation stage, to effectively incorporate the linguistic context of the target language. We also investigate various seed models for transfer learning. Experimental evaluations using the IARPA BABEL data set show that LM fusion transfer improves performance on all five target languages compared with simple transfer learning when external text data is available. Our final system drastically reduces the performance gap to the hybrid systems. Comment: Accepted at ICASSP201

    COMPARISON OF FOOT MORPHOLOGY AND PREFERRED SHOE FOR IMPROVING RUNNING SHOE FITTING

    The purpose of this study was to compare the internal space of shoes with foot shapes of different types in order to improve the sense of shoe fit. A total of 347 healthy subjects (160 male, 187 female) without any pathological conditions of the foot participated. Eleven pairs of running shoes in different sizes (230-280 mm), with the same material and appearance, were prepared, and the shapes of the shoe lasts were also measured. To evaluate the sense of shoe fit, six fit indicators were analysed by comparing the shape of the shoe last with foot morphology. We found that people with wider feet tended to wear tighter shoes, while people with narrower feet preferred looser shoes, a tendency that appears to be significantly affected by experience. The sense of shoe fit also differed significantly by gender and foot type, which can serve as important data for recommending shoe sizes and making customized shoes.

    Improved Multi-Shot Diffusion-Weighted MRI with Zero-Shot Self-Supervised Learning Reconstruction

    Diffusion MRI is commonly performed using echo-planar imaging (EPI) due to its rapid acquisition time. However, the resolution of diffusion-weighted images is often limited by magnetic field inhomogeneity-related artifacts and blurring induced by T2- and T2*-relaxation effects. To address these limitations, multi-shot EPI (msEPI) combined with parallel imaging techniques is frequently employed. Nevertheless, reconstructing msEPI can be challenging due to phase variation between multiple shots. In this study, we introduce a novel msEPI reconstruction approach called zero-MIRID (zero-shot self-supervised learning of Multi-shot Image Reconstruction for Improved Diffusion MRI). This method jointly reconstructs msEPI data by incorporating deep learning-based image regularization techniques. The network incorporates CNN denoisers in both k-space and image space, while leveraging virtual coils to enhance image reconstruction conditioning. By employing a self-supervised learning technique and dividing sampled data into three groups, the proposed approach achieves superior results compared to the state-of-the-art parallel imaging method, as demonstrated in an in-vivo experiment. Comment: 10 pages, 4 figures

    Scan Specific Artifact Reduction in K-space (SPARK) Neural Networks Synergize with Physics-based Reconstruction to Accelerate MRI

    Purpose: To develop a scan-specific model that estimates and corrects k-space errors made when reconstructing accelerated Magnetic Resonance Imaging (MRI) data. Methods: Scan-Specific Artifact Reduction in k-space (SPARK) trains a convolutional neural network to estimate and correct k-space errors made by an input reconstruction technique by back-propagating from the mean-squared-error loss between an auto-calibration signal (ACS) and the input technique's reconstructed ACS. First, SPARK is applied to GRAPPA and demonstrates improved robustness over other scan-specific models, such as RAKI and residual-RAKI. Subsequent experiments demonstrate that SPARK synergizes with residual-RAKI to improve reconstruction performance. SPARK also improves reconstruction quality when applied to advanced acquisition and reconstruction techniques like 2D virtual coil (VC-) GRAPPA, 2D LORAKS, 3D GRAPPA without an integrated ACS region, and 2D/3D wave-encoded images. Results: SPARK yields 1.5x - 2x RMSE reduction when applied to GRAPPA and improves robustness to ACS size for various acceleration rates in comparison to other scan-specific techniques. When applied to advanced reconstruction techniques such as residual-RAKI, 2D VC-GRAPPA and LORAKS, SPARK achieves up to 20% RMSE improvement. SPARK with 3D GRAPPA also improves performance by ~2x and perceived image quality without a fully sampled ACS region. Finally, SPARK synergizes with non-Cartesian 2D and 3D wave-encoding imaging by reducing RMSE by 20-25% and providing qualitative improvements. Conclusion: SPARK synergizes with physics-based acquisition and reconstruction techniques to improve accelerated MRI by training scan-specific models to estimate and correct reconstruction errors in k-space.

    CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

    This paper considers contrastive training for cross-modal zero-shot transfer, wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a zero-shot way, similar to "Contrastive Language-Image Pre-training (CLIP)" and "Locked-image Tuning (LiT)", which have recently gained considerable attention. Most existing works on cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more 'non-binary' treatment. To address this, we propose a novel loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to align the embedding space of one modality with another. Owing to the continuous nature of similarity in the proposed loss function, these models outperform existing methods for zero-shot transfer across multiple models, datasets and modalities. In particular, we consider the modality pairs of image-text and speech-text, and our models achieve 5-8% (absolute) improvement over previous state-of-the-art methods in zero-shot image classification and 20-30% (absolute) improvement in zero-shot speech-to-intent classification and keyword classification. Comment: Accepted to Neural Information Processing Systems (NeurIPS) 2023 conference
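
The idea of replacing binary positive/negative labels with a continuous similarity weight can be illustrated with a small NumPy sketch. The abstract does not state the loss formula, so everything specific here is an assumption made for illustration: the function name `cwcl_loss`, the weighting `0.5 * cos + 0.5` computed from the frozen modality's embeddings, the temperature value, and the normalisation by the row sum of weights. It should be read as one plausible form of a continuously weighted contrastive objective, not as the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cwcl_loss(p, q, temperature=0.1):
    """Hypothetical continuously weighted contrastive loss.

    p: (N, d) embeddings from the modality being trained.
    q: (N, d) embeddings from the frozen, pre-trained modality.
    Row i of p is paired with row i of q.
    """
    p = l2_normalize(p)
    q = l2_normalize(q)
    # Assumption: similarity weights come from the frozen modality,
    # with cosine similarity mapped from [-1, 1] to [0, 1].
    weights = 0.5 * (q @ q.T) + 0.5
    # Cross-modal log-probabilities, as in a standard contrastive loss.
    log_probs = log_softmax((p @ q.T) / temperature)
    # Every pair contributes in proportion to its weight, instead of
    # only the diagonal (the single "positive") contributing.
    per_sample = -(weights * log_probs).sum(axis=1) / weights.sum(axis=1)
    return float(per_sample.mean())
```

Under these assumptions, a batch in which the trained embeddings match their paired frozen embeddings should score a lower loss than the same batch with the pairing shuffled, and the loss is unaffected by rescaling the inputs because both sides are normalised first.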